PREFACE


DATA for this report are taken in part
from research conducted at The
Ohio State University under the auspices
of the National Research Council Com-
mittee on Selection and Training of Air-
craft Pilots, with funds provided by the
Civil Aeronautics Administration.

Each of the authors is indebted to Dr. 
Floyd C. Dockeray of The Ohio State 
University for his valuable counsel and 
guidance in the designing and conduct- 
ing of the experiments. 
FOREWORD


UNTIL comparatively recently there
has been little agreement as to
what constitutes a successful airplane
pilot. Likewise, although there have been 
many attempts to devise methods for the 
selection of individuals who if given 
proper training would be able to pilot 
airplanes with some degree of com- 
petence, the field has been only partially 
explored.

Psychological research in pilot selec-
tion began during the first world war. An
excellent review of the type of work done 
during that period is given by Dockeray 
and Isaacs (3). Much of the early work 
was of doubtful direct value because of 
lack of opportunity for validation in 
actual flight training circumstances, and, 
in some cases, imperfect design, lack of 
adequate data and improper statistical 
treatment (18). It was, however, a definite 
contribution and pointed out, in many 
cases, the direction for later research. 

After the cessation of hostilities inter-
est waned; Jenkins (6) reports that by
1939 no psychologist was working
in the field of aviation psychology. Re- 
search in the field was not revived until 
the present conflict became imminent. 
However, in the interim between the two 
wars, laboratory work in the field of psy- 
chology was done, some of which, when 
the occasion arose, proved to be of value 
in the selection of pilots, even though 
originally it had not been undertaken 
with this in view. However, as had been 
the case with a large part of the early 
pilot selection research, the value of 
much of this was not immediately known 
because of a lack of opportunity to vali- 
date the findings in actual flight training 
circumstances.

It is recognized that the research done
in recent years on analysis of piloting has
resulted in a large amount of pertinent 
information. Likewise, research per-
formed on the isolation of the personal
characteristics which are involved in fly- 
ing has yielded acceptable evidence as to 
their importance. Much of this informa- 
tion, however, has been related to mili- 
tary aviation as, for example, the work 
of Carlson (1) and Delucchi (2) where 
interest has centered in the selection of 
military pilots. With the exception of 
the research projects which have been 
done under the auspices of the National 
Research Council, Committee on Selec- 
tion and Training of Aircraft Pilots (25), 
few investigations of a psychological na- 
ture have been concerned with the prob- 
lems of the civilian aviator. In view of 
the widespread interest in private flying, 
and with the current liberal selection 
standards for candidates for the private 
pilot’s license, it seems more than ever 
advisable to examine the selection pro- 
cedures which are available. That this is 
a task for the psychologist has been ex- 
pressed well by Kellum (11) who says 
that the real problem of the selection of 
aviators begins after the physical exami- 
nation has been given. He points out 
that a large percentage of candidates who 
pass the physical examination eventually 
fail in flight training, and the reasons 
for failure are non-physical. 

One of America’s foremost investiga-
tors in the field of aviation selection dur-
ing the recent war, Liljencrantz (14), de- 
fined pilot selection as follows: 

The process of selection may be conceived 
as consisting of the administration of a test 
or a single group or battery of tests, on the 
basis of which a dependable decision can be
reached as to an applicant’s aptitude for
aviation. 


The abilities required in successful 
piloting are best described as a complex 
of coordinations, skills, and abilities. 
Therefore, adequate pilot selection 
would best be accomplished through a 
method which combined various meas- 
urements of the components of this com- 
plex. Thus far, no combination of selec- 
tion techniques has proved to be com- 
pletely valid, nor has any test battery by 
itself or when combined with other selec- 
tion techniques such as the informal 
interview or application blank, reached 
a point wherein its validity could not be 
improved. The question might be asked
as to whether or not a test battery can
be assembled to measure the various fac-
tors involved in learning to fly and if
such a battery might be used to predict
the success of candidates in learning to
fly light airplanes. If any combination
of selection tools can be made that serves
the purpose of accurate prediction (of
ultimate success or failure), this com-
bination should prove to be more eco-
nomical and valuable to use than any
one of the component predictors by it-
self, or than the selection procedures
already in use.
STATEMENT OF THE PROBLEM


IT was on the basis of the foregoing dis-
cussion that the present research was
undertaken. A study was planned, first of 
all, which would make it possible to 
analyze some of the factors involved in 
the determination of success in learning 
to fly light aircraft. Various tests were 
available which had already been shown 
to measure factors involved in learning 
to fly, and a new test was added. This test 
was designed to measure factors not con- 
sidered by any of the other tests. It was 
believed that these tests could be assem-
bled into a battery which would not only
measure some of the factors involved in 
learning to fly, but which would also be 
of some use in predicting success or fail- 
ure in flight training. It was also believed 
that it would be possible to show how 
much each of the factors measured by the 
tests contributed to the prediction of 
success or failure in learning to fly. 
Finally, it was believed that the predic- 
tive value of the test battery might best 
be examined if several types of criteria 
were used. It seemed possible that a test 
battery might have high predictive value 
for a criterion of success in specific 
maneuvers, but be worthless in the pre- 
diction of over-all success or failure. 


SUBJECTS USED IN THE INVESTIGATION 


Thirty-seven male subjects between the 
ages of 17 and 29 years were used in this 
investigation. All were enrolled in the 
National Research Council flight train-
ing program at the Ohio State Univer-
sity, and received their flight training be-
tween the middle of October, 1945, and
the first of March, 1946. Twenty-nine 
men were either enrolled in college, or 
had received bachelor’s degrees and had 
research positions on the campus. 


SELECTION TESTS USED IN THE PRESENT 
INVESTIGATION 


All applicants for flight training under 
the experimental program were given a 
battery of tests. These tests were as 
follows: 


1. The Self-Administering Test of Mental
Ability
(Gamma A. M. Otis quick-scoring)
2. The Ohio State Psychological Examina-
tion
(Form 22)
3. Test of Aviation Information (Form P)
4. Biographical Inventory (Form 2C
Civilian Key)
5. Test of Mechanical Comprehension
(Form B, CAA)
6. Desire to Fly (Form XPA)
7. Mashburn Serial Reaction Test
8. Two-Hand Coordination Test
9. Judgment-Reaction Test

In the selection of tests for inclusion in 
this predictor battery, an attempt was 
made to include tests which were easy 
and economical to administer, closely re- 
lated to aviation and which produced 
results capable of being interpreted in a 
straight-forward manner. In addition, the 
following considerations were utilized. 

The role of general intelligence in 
flight success has been widely recognized 
(17), and the Test of Mental Ability of 
the C.A.A. fulfills the requirements of 
being a reliable instrument which can 
be used economically from the point of 
view of both time and money. A test-
retest measure reported for applicants for
primary and secondary war service train- 
ing under the C.A.A. is .79 (21). In ad- 
ministering the test, a twenty-minute 
time limit was used, and raw scores were 
utilized in statistical computation. 

Since one study at least (10) has indi- 
cated that there is some relationship be- 
tween grades in college and success in 
flight training, it was decided to include 
some such measure in the current bat- 
tery. The N.R.C. Flight Training Course 
at the Ohio State University was open to 
non-students as well as to students; there- 
fore, scholastic grades were not available 
for all individuals taking part in the 
study. However, the Ohio State Psycho- 
logical Examination was available, and 
this test has been shown to have a cor- 
relation of .606 with first semester scho- 
lastic grades for male students (24). Ac- 
cording to a verbal report made by Dr. 
Herbert Toops, author of this test, the 
reliability coefficients which have been 
calculated for this test are approximately 
.94. Grades are expressed in percentiles,
based upon current norms at the Ohio 
State University. Percentiles were used in 
the statistical portion of this study. 

The Test of Aviation Information (AI)
was developed in 1941-43 at Wesleyan
University and the University of Roches-
ter, under the sponsorship of the Com-
mittee on Selection and Training of Air-
craft Pilots, NRC. Preliminary work done
on this test indicates that “it can be in- 
cluded as one of the more promising pre- 
dictors developed in the Committee’s re- 
search program” (25). The test contains 
a total of 200 questions concerned with 
aviation. The raw score was used in all 
statistical computations. Reliability co- 
efficients of .751 and .771 have been re- 
ported for this test (21). 
The Biographical Inventory is other-
wise known as the Inventory of Personal
Data for Prospective Pilots (27). Regard-
ing this inventory, Viteles has said (25):

The Biographical Inventory represents one
of the first, if not the first, successful attempt
to predict pilot proficiency from biographical
data.

Also, it should be remembered that 
biographical data have often been used 
in informal interviews. Psychiatric 
examination of candidates for flight 
training often includes questions dealing 
with the individual’s past history. John-
son (8) has pointed out that biographical
data, if properly used, are extremely valu-
able in the selection of individuals for
aeronautical training. 

A newly created Civilian key (The 
Kelly Positive Key) was used in scoring 
this inventory, and raw scores were used 
in the computation of statistical results. 
Reliability coefficients of .525 and .603
have been reported previously for this 
inventory (21). 

Since, in any learning situation, the
factor of motivation is most important,
it seemed desirable to include some meas-
ure of the student’s interest in learning
to fly. The Desire to Fly Inventory (12)
was developed at the University of
Rochester, and preliminary work on it
has indicated that it has practical useful-
ness in a battery of predictors for success
in flight training. It contains a total of
235 questions pertinent to interest in fly-
ing which are to be answered by the
applicant. Key A B was used in scoring,
and raw scores were utilized in the cur-
rent investigation.

Although no reliability coefficients are 
available for the Desire to Fly Inventory, 
the authors indicate that it is reliable in 
their report to the National Research 
Council (12). In this report they state: 
An analysis of the distributions of items
answered “No” by various percentages of the
populations of both samples A and B indi-
cated that the items are fairly stable in the
sense that each was answered “No” by ap- 
proximately the same proportion of cases in 
each sample. 

The Mashburn Serial Reaction Ap-
paratus is one of the older psychomotor
tests still being used in pilot selection. 
Actually the test measures the individ- 
ual’s ability to make rapid eye-hand-foot 
reactions. A complete description of this 
apparatus is available in McFarland’s re- 
port No. 34 for the Airman Development 
Division of the C.A.A. (15). The total
time required for making a series of forty 
eye-hand-foot responses was obtained and 
used in later statistical work. In the 
Boston-Midwest Study (20) reliability 
coefficients of .53, .74, and .74 were found 
for this test when three different samples 
were used. 

The Two-Hand Coordination Test was 
developed from two tests formerly used 
in industrial selection, namely, the Wis- 
consin Miniature Engine-Lathe Test, and 
the Farmer-Chambers Coordination
Test. A complete description of the cur- 
rent version of the test is available (16). 
On this test the subject is given six trials
and is scored on the percentage of time
he maintains contact between two mov-
ing discs. The preliminary studies on this
test indicate that the mean of six trials 
was found to correlate .78, .87, .80 with 
trials 4, 5, and 6. Therefore, the mean 
score for each applicant was used in this 
study. Reliability data are available from 
the Boston-Midwest Study for three sam-
ples and were found to be .75, .50, and
.80 (20).

The Test of Mechanical Comprehen-
sion (MC) contains seventy-six questions
with companion diagrams concerning
mechanical problems. Reliability co-
efficients of .697 and .743 have been re-
ported (21).

In view of the fact that, in the past, 
tests involving reaction time have shown 
relationships greater than chance to fly- 
ing success, it seems that any battery of 
tests constructed for the purpose of pre- 
dicting success in flight training should 
be heavily weighted with tests involving 
reaction time. Therefore, in the present 
investigation, a new test was included, 
which, although involving a measure- 
ment of reaction time, was improved 
from the points of view of economy of 
construction, and of simplicity of admin- 
istration and scoring. This has been 
designated as the Judgment-Reaction 
Test. 

The Judgment-Reaction Test was 
based upon an apparatus originally de- 
signed by Ranschburg and called by him 
a “Mnemometer.” For purposes of the 
present experiment the basic apparatus 
was modified and improved. As used in 
this research, the test consists of a black 
wooden box approximately 15 inches 
square in which the revised Ranschburg 
rotation mechanism was placed. The 
front surface of the box contains an 
aperture through which the stimulus is 
visible and two hand switches which are 
operated by the subject. 

Stimuli consist of sixty shades of Her-
ing grays so arranged on a cardboard disc
that as the correct switch is pressed by
the subject, the disc moves one-sixtieth 
of its circumference and reveals a new 
stimulus through the aperture. The sub- 
ject is required to make a judgment as to 
whether the shade of gray which has be- 
come visible is darker or lighter than the 
one which was visible previously. If it is
darker, he must press the switch on his 
right; if it is lighter, he must press the 
switch on his left. If he makes an incor-
rect response, the stimulus does not
change. Only the preferred hand is used. 

The subject is required to make a 
series of sixty judgment reactions and is 
scored on the quickness with which he 
completes the series. He is given a series 
of six trials of sixty judgments each with 
a rest pause of one minute between trials. 
In preliminary experiments done with 
this test, a correlation of .90 was found
between trials 4 and 5. This was taken 
as the point at which the effects of initial 
practice were at a minimum and also as 
an indication of test-retest reliability. 
The time required to complete sixty 
judgment reactions in the fifth trial was 
used as the score for the subject in this 
investigation. 
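
The reliability check described above can be illustrated with a short computation. The following is a minimal sketch only: it assumes that the coefficient in question is the ordinary product-moment correlation, and the completion times shown are invented for illustration and are not data from this study. The names pearson_r, trial_4, and trial_5 are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical completion times in seconds for six subjects
# (illustrative values, NOT taken from the investigation).
trial_4 = [92.0, 105.5, 88.0, 120.0, 99.5, 110.0]
trial_5 = [90.0, 101.0, 87.5, 116.0, 97.0, 108.5]

# Agreement between trials 4 and 5, read here as test-retest reliability.
r_45 = pearson_r(trial_4, trial_5)

# The fifth-trial time is the score carried forward for each subject.
fifth_trial_scores = trial_5
print(round(r_45, 3))
```

A high value of r_45 would indicate that subjects keep nearly the same rank order from the fourth to the fifth trial, which is the sense in which the text treats the coefficient as an index of reliability.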


CRITERIA USED IN THE PRESENT 
INVESTIGATION 

The criteria of success in flight train- 
ing available in this study were divided 
into two main types: gross criteria, or 
measures of over-all success or failure in 
flight training; and specific criteria, such 
as observations of good or poor per- 
formance on specific aspects of flight per- 
formance. These are not mutually ex- 
clusive for it is entirely possible that a 
student might perform above average on 
most maneuvers, yet not be able to com-
bine their performance into a smooth pat-
tern and, thus, be rated as a “poor” flier.
Still again, a student might be able to 
perform maneuvers and specific aspects 
of flight well, yet be labeled an “unsafe” 
pilot. On the other hand, a student might 
successfully pass his private flight exami- 
nation by giving an overall good per- 
formance, although inspection of his 
individual grades might reveal that he
was better in some aspects of flight per- 
formance than in others. 

There is also a large variation in the
student’s performance from day to day,
so it seemed advisable that any data col-
lected on his performance should be
sampled over a period of time in order
to obtain as accurate a picture of his
flight performance as possible.

Furthermore, there is the ever present
difficulty of unreliability of criteria ob-
tained from ratings. One way by which
it is possible to minimize the error due to
individual ratings is to combine ratings
or to secure ratings from different ob-
servers whenever possible.

Another consideration in the selection
of criteria, especially in a study such as
this one involving flight performance, is
the practicality of collecting observations.
This factor necessarily limited the types
of observations which could be collected.

Keeping in mind these principles, cri-
terion data were obtained from four
sources:

1. The C.A.A. Flight Inspector 

2. The student’s instructor 

3. The check flight pilot 

4. The student’s log book 

Two measures were available from the
C.A.A. Flight Inspector. These were the
overall grades on the private pilot test
and the demerit score given on the stu-
dent’s landing performance.

The C.A.A. grade on the private pilot
test represents the inspector’s opinion of
the student’s skill in piloting. It deter-
mines, in part, whether or not the stu-
dent is issued the private pilot’s license.
Normally, this test, which comprises a
series of maneuvers prescribed by the Civ-
il Aeronautics Administration, does not
take place until the student’s instructor
has recommended him as being ready for
the private pilot’s license. However, in
the experimental program, all students
received this test after they had com-
pleted thirty-five hours of flight training.
Previous research has shown that the
inspector’s grade on this test has a sub- 
stantial relationship to some of the pre- 
dictors which were used in the current 
research (10). 

It was mentioned previously that in 
this study the attempt was made to sam- 
ple criteria of two types, gross and spe- 
cific. Since landing is one of the most 
complex and difficult of all maneuvers 
encountered in learning to fly, it was 
selected to represent the specific criteria. 
The following quotation from an Army 
Air Forces article expresses well the 
reason for stressing the student’s ability 
to land a plane (28). 


Experts are of the definite opinion that 
landing in the proper place in the proper 
attitude without dropping the plane in or 
bouncing it involves important aspects of 
flying skill, namely, the ability to judge space
and plan a course through it, to control the
attitude and airspeed of the plane, and to
feel when it is about to stall. 


One requirement of the private pilot 
test is that the student attempt three spot 
landings; that is, attempt three times to 
set the plane on the ground within 300 
feet of a designated spot on the runway. 
Failure to do this in two out of three 
attempts means failure in the entire
examination.


According to C.A.A. regulations, scores 
on individual maneuvers such as this are 
given on a demerit basis. Demerit scores 
range from one through five, having 
values as follows: 

1. Excellent (90-100)

2. Above average (85-90)

3. Average (80-85)

4. Below Average (70-80)

5. Unsatisfactory (0-70)*

* C.A.A. form ACA 342


Although the student has to attempt
three landings, only one demerit score,
representing the average score on his
performance, is given for landing. This
score plus the overall grade on the pri- 
vate pilot test were selected to represent 
the C.A.A. inspector's opinion of the stu- 
dent’s flying ability. 
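
By way of illustration only, the correspondence between percentage values and the five demerit scores listed above may be written as a simple lookup. This is a sketch under stated assumptions: the function name demerit_from_percent is hypothetical, and because the published bands share their limits (85, for example, closes one band and opens another), the sketch assumes that a value falling on a shared limit is placed in the better band. The C.A.A. form itself is not quoted here on that point.

```python
def demerit_from_percent(grade):
    """Map a percentage value (0-100) to a demerit score from 1 to 5.

    Assumption (not stated in the source): a value on a shared band
    limit (70, 80, 85, 90) is assigned to the better of the two bands.
    """
    if not 0 <= grade <= 100:
        raise ValueError("grade must lie between 0 and 100")
    if grade >= 90:
        return 1  # Excellent
    if grade >= 85:
        return 2  # Above average
    if grade >= 80:
        return 3  # Average
    if grade >= 70:
        return 4  # Below average
    return 5      # Unsatisfactory


# Illustrative values only; these are not grades from the study.
for pct in (95, 87, 82, 74, 60):
    print(pct, demerit_from_percent(pct))
```

Under this reading a lower demerit score denotes better performance, which is why the landing demerit and the over-all grade would be expected to correlate negatively.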

Five measures were available from the 
student's flight instructor and were in- 
cluded in the initial phases of this study. 
The first four of these measures repre- 
sented the instructor’s appraisal of the 
student’s skill in landing. The instructor 
kept a daily check sheet on the student’s 
performance during the flight lesson, and 
each maneuver performed was graded on 
a percentage basis, with 70 being re- 
garded as passing. In order to obtain 
landing data on each student from the 
flight instructor, the following procedure 
was devised. 

During each flight lesson a student
might make a variable number of land-
ings. In some flight lessons, he might
make none at all without assistance. In 
order to obtain an adequate sample of 
the instructor’s grades on unassisted land- 
ings for each student, an average grade 
was computed for landings made by the 
student during the two flights immediate- 
ly preceding each of the four check 
flights. This represented the most satis- 
factory method of obtaining instructor 
data on landings, and these data are com-
parable on a time basis to those obtained
from the check pilots. 

Further data were available from the 
instructor in the form of ratings on a 
“Scale for Rating Pilot Competency.” 
This scale was developed at Purdue Uni-
versity (9) and a factor analysis of the
scale has indicated that the 14 items in 
the scale measure three distinct factors, 
tentatively called “skill,” “judgment,” 
and “emotional control.” Preliminary 
work on the scale has indicated that its 
use differentiates between the “best” and
“poorest” students of a large group of 
instructors from several different areas. 

There is a possibility of scoring each 
item on 40 points. Since the original ex- 
perimentation done on this scale indi- 
cated that the most reliable results were 
obtained by adding the scores on the 
three factors, and combining them into 
three separate scores, this method was 
followed in the present study. The total 
number of points on each factor received 
by a student was regarded as his score. 

The Ohio State Flight Inventory pro- 
vided a major portion of the criteria 
used in this study and comprised all of 
the data received from the check pilots. 
The OSFI, as it is commonly called, was 
the result of research initiated at the 
Ohio State University in 1939, in an 
attempt to devise a standardized rating 
technique for use in making observations 
of student pilot performance (4). The 
most recent version of the OSFI was used 
(19). 

This inventory was administered four 
times to each student during the course 
of flight instruction. It was used during 
the check flights which occurred at the 
end of seven, fifteen, twenty-five, and 
thirty-five hours of flight instruction. 
Two check pilots were used to administer 
this inventory, and these pilots flew with 
each student on alternate check flights. 
The inventory was filled out during the 
flight so that omissions due to forgetful- 
ness or oversight might be kept at a mini- 
mum. The check pilots flew with the 
student only during the check flight, and 
thus, this opinion of the student's flight 
performance represents an independent 
estimate. 

Two types of data were available from 
the OSFI. These are referred to as de-
merit scores and maneuver grades. De-
merit scores refer to scores based upon
individual aspects of a maneuver, and 
grades represent an over-all estimate of 
maneuver performance. 

Each check flight required three rat-
ings of the student’s performance in mak-
ing a “Final Approach and Landing.”
Data on this maneuver were divided into 
three main categories, “Control of 
Plane,” “Precision,” and “Safety.” Each 
of these was broken down into sub- 
divisions which could be scored separate- 
ly. In this way it was possible to secure a 
detailed account of the student's per-
formance on that particular landing.
Scores on this check sheet were obtained 
in the form of demerit weights, ranging 
from one through three. If a particular 
performance was satisfactory, no score 
was given. 

For the present purposes, the total de- 
merit score for each landing was com- 
puted. Then an average was taken of the 
demerit scores on as many landings as 
the student made. In no case was more 
than one landing omitted, and this omis- 
sion was generally due to incomplete 
scoring because the check pilot had to 
take over the controls. Average demerit 
scores were available on four check 
flights for each student used in the study. 
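
The scoring just described may be sketched as follows. The sketch assumes that each landing is recorded as a list of the demerit weights (one through three) entered for its unsatisfactory items, that an empty list stands for a landing with no demerits, and that a landing which could not be scored is recorded as None. These conventions, and the function names, are hypothetical and serve only to make the arithmetic explicit.

```python
def landing_total(demerits):
    """Total demerit score for one landing (sum of weights 1 through 3)."""
    for weight in demerits:
        if weight not in (1, 2, 3):
            raise ValueError("demerit weights run from one through three")
    return sum(demerits)


def check_flight_average(landings):
    """Average total demerit score over the landings that were scored.

    A landing recorded as None was omitted (for example, because the
    check pilot had to take over the controls). At most one omission
    is accepted, in keeping with the procedure described in the text.
    """
    scored = [landing_total(d) for d in landings if d is not None]
    omitted = len(landings) - len(scored)
    if omitted > 1 or not scored:
        raise ValueError("too few scored landings on this check flight")
    return sum(scored) / len(scored)


# Hypothetical check flights (illustrative values, not study data).
complete_flight = [[1, 2], [], [3, 1]]       # three scored landings
flight_with_omission = [[1, 2], [3], None]   # one landing omitted

print(check_flight_average(complete_flight))
print(check_flight_average(flight_with_omission))
```

Averaging over the landings actually scored, rather than over a fixed count of three, keeps a student's score comparable when one landing is missing.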

Each landing that the student made 
during the check flight was also assigned 
an over-all grade by the check pilot. This 
grade reflected his opinion of the stu- 
dent’s over-all performance in landing 
the plane. In all check flights the check 
pilot was asked to grade the student on 
a percentage basis, using 70 as the fixed 
grade for a passing score. At each stage in 
his training, the student was compared 
with the performance expected of a certi- 
fied private pilot; in other words, he was 
graded on a fixed scale. The average over-
all grade for each landing made was com-
puted for each of the four check flights.

In addition to making three regular 
landings during each check flight, the 
student was asked to execute one “Land- 
ing with a Slip.” Performance on this 
maneuver was first scored on a demerit 
basis and the total number of demerits 
was computed for each “Landing with a 
Slip,” making available a total of four 
sets of demerit scores. Over-all grades for 
this maneuver were also available and
were included. 

On the fourth check flight, at the con- 
clusion of thirty-five hours of flight in- 
struction, the student was required to 
proceed to a strange field and make a 
landing. This necessitated a somewhat 
different approach to the field since in-
structions were to make a “power-on
landing,” whereas the other landings
were made “power-off.” Demerit scores
for this maneuver were available for the 
fourth check flight only and were in- 
cluded in the criteria. 

An independent gross criterion was 
obtained by keeping a record of the 
length of time required before the stu- 
dent was allowed to solo. Although popu- 
larly, such time measures as this have 
been thought to be of significance, pre- 
vious studies have shown that this par- 
ticular measure may not be too signifi- 
cant (25). “Time to Solo” records were 
available for all students in terms of 
hours and minutes before soloing, and 
were included as criteria. 

Although the reliability of the criteria 
used in this investigation is still being 
studied in other investigations, some data 
are available which indicate that these 
criteria are suitable for use in connection 
with pilot selection. 

In a study reported by the National 
Research Council (19), correlations be-
tween inspector’s flight grades and the
OSFI summation score were found to
range from .83 to .95. This indicates a
marked consistency between the ratings
secured from these two sources. In a more
recent report, Wapner and Bakan (26) 
have attempted to demonstrate the rela- 
tionship of inspector’s grades on the 
OSFI to photographic records of the stu- 
dent’s performance. Their conclusions 
were that although, in general, the ac- 
curacy of the inspector’s rating was high, 
there was some variation in the degree 
of accuracy for certain maneuvers. How- 
ever, they feel that their study supplies 
empirical evidence to justify placing con- 
fidence in the inspector’s criterion meas- 
ures. Actually, this shows that such rat- 
ings can be accurate. And thus it seems 
justifiable to use raters who have been 
well trained in using the OSFI, in re- 
search on pilot selection. 

In an analysis done by Johnson and 
Boots (7), of ratings in the preliminary
phase of the CAA Training Program, it
was found that correlations between in-
spector’s final ratings and instructor's
mean ratings on given maneuvers were 
low, even for the last two hours of flight. 
Furthermore, the intercorrelations be- 
tween instructor’s mean ratings for given 
maneuvers ranged from .29 to .93, with
the higher correlations tending to be be- 
tween maneuvers most frequently rated. 


RESULTS AND DISCUSSION OF RESULTS 


The procedure usually adopted when 
it is desired to evaluate the relative con- 
tribution of single tests to the predictive 
value of a battery of tests is to calculate 
regression coefficients for the tests in- 
volved. The method which is regarded as 
most satisfactory when working with 
many variables is that originally devised 
by Doolittle. A complete description of 
this method is available in Peters and 
Van Voorhis (22) pp. 226-234. This tech-
nique was utilized in this study.

Each test was selected for inclusion in
the battery because it was believed to
measure some factor or factors involved
in the ability to fly light aircraft. There-
fore, some measure of the relationships
between the tests was necessary, and in-
tercorrelations were computed. Table I
shows the zero order correlations among
all of the tests included in the battery
of predictors. In general, these correla-
tion coefficients were not high, indicat-
ing that each test was fairly independent
of the others. The major exception to
this was the correlation of .744 between
the O.S.P.E. and the test of Mental Abil-
ity. This might have been expected, how-
ever, from the nature of the tests. Next
to this was the correlation of .625 be-
tween the two coordination tests, and
this, too, might have been expected from
the nature of the two tests involved.


TABLE I
Intercorrelations of Predictors
(Order of tests: Judgment reaction, Mental ability, OSPE, A.I., B.I., M.C., D.F., Mash., Two-hand)

.204
Judgment reaction  .353  .289  -.005  -.047  .228  .270
Mental ability  .744  .420  -.017  -.190  .376
OSPE  .439  .062  .285  -.079  .383  .398
A.I.  .395  .419  -.208  .168  .276
B.I.  .013
M.C.  .087  .411  .526
D.F.  .145  .067
Mash.
Two-hand


Table II shows the intercorrelations
among the criteria. As in the case of the
predictors, the intercorrelations were, in
general, low. The highest correlation was
.725, between the same instructor's
“scores” and “grades” on landings with
a slip on the first check flight. Sixteen of
the correlations were above .50. For the
most part the higher correlations were
found between grades and scores given
by the same pilot on the same maneuver,
thus giving an indication of the reliabil-
ity of these ratings. Table III, below,
shows the correlations which obtained
between demerit scores and grades on
landings when both measures were ob-
tained from the same instructor.

All variables were coded in such a way
that a positive correlation indicates a
positive relationship between proficiency
in both variables, with the exception of
the C.A.A. landing grade. Negative cor-
relation in this case indicates a positive
relationship in proficiency.

The correlation between the two meas-
ures from the C.A.A. Inspector's flight
test was -.512. A relationship such as this
might be expected as the landing grade
is one of the most important deter-
minants of the over-all grade. Evidence
that a positive relationship does obtain
between these two grades is furnished in
a study done on the analysis of Inspec-
tor’s ratings by Festinger (5) in which it
was found that among the most uni-
formly high correlations between mean
maneuver grades and over-all grades on
the private pilot flight test were those
obtained from precision landings. These
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TABLE Ill 


Correlations Between Demerit Scores 
and Grades on Landings 
(Same Instructor) 


Landing 
grades 
First check -409 
Second check . 580 
Third check -634 
Fourth check .662 


correlations are given below: 

Inspector C Inspector D 
Overall: Landing 68 N= 27 .65 N= 28 
Overall: Landing .57 N= 28 .73 N= 28 

Although two of the factors contained
in the Purdue Rating Scale showed low
positive relationship with one another,
the correlation between ratings on “Emo-
tional Control” and “Skill” was excep-
tionally high. It is quite possible that an
emotionally stable individual would not 
be tense in handling the controls of a 
plane, and it is well known that much of 
the flight instructor’s time is spent in 
attempting to get students to relax. 
Therefore, it is probably to be expected 
that ratings in skill and emotional con- 
trol would be positively related. 
The correlations between the demerit
scores given on “Landings” in the four 
check flights were not high, indicating 
that probably the factors involved were 
fairly independent of one another. It is 
probable that landing performance 
actually changes as the student has more 
flight instruction. 

The correlations between demerit 
scores on “Strange Field Landings” and 
other ratings were low with the exception 
of those between demerit scores and 
grades on “Landings with a Slip” in the 
fourth check flight. Actually this affords
an indication of the reliability of the
ratings, as “Strange Field Landings” were
administered only in the fourth check
flight. Thus the same individual was re-
sponsible for these two ratings. The high
correlation might also indicate that the 
same type of abilities are involved in 
making landings with a slip and strange 
field landings. 

The correlations between instructor’s
grades on landings were all positive but
not high. The highest was a correlation
of .539 between the average grades on
landings preceding the second and third
check flights. The highest correlation be-
tween instructor’s grades and any of the
other criteria was one of .593 obtained
between instructor’s grades on “Land-
ings” preceding the second check flight
and “Time to Solo.” It would appear that
favorable ratings on landing performance 
are related to the time when the instruc- 
tor gives the student permission to solo. 

“Time to Solo” showed a fairly high
correlation with the C.A.A. inspector's
over-all grade, and a positive relation-
ship with the C.A.A. landing grade.
“Time to Solo” is most highly correlated
with the instructor’s rating of skill on
the Purdue scale. This again suggests
that the instructor actually does allow a
student who he feels is skillful to solo
early in the flight course. In general, this
time measure showed fairly high correla- 
tions with the other criteria. 

The intercorrelations between check 
pilot’s grades on landings in the four 
check flights were all positive with the 
exception of ratings on the fourth check 
flight. In fact, the correlations between 
grades on landings in the fourth check 
flight show a progressively inverse rela- 
tionship to grades on the first, second, 
and third check flight. 

This is quite different from what
might have been expected, since it seems
more logical that grades on landings
should show more relationship to one
another as the time of flight instruction
increases. Unreliability of the rater
might be offered as one explanation, al-
though chance might also be responsible
for what appears to be a progressively
inverse relationship. It is possible that
variation in weather conditions con-
tributed to this relationship.


TABLE V
Correlations of Predictors with Selected Criteria

Predictors   Slip Grades   Slip Demerits   Instruct. Land Grades   OSFI Land Grades   OSFI Land Scores   Emot. Control   C.A.A. Land   C.A.A. Overall

Check pilots and instructors have been 
unanimous in stating that one should not 
expect much agreement between ratings 
on one check flight and those obtained 
from any other because of the vast varia- 
tion in the air conditions under which 
the student must fly. What might serve 
as a satisfactory performance on one day 
might be totally inadequate when 
weather conditions are different. 

Table IV shows the correlations be-
tween the predictors and each of the
criteria. A cursory examination of this 
table indicates that once again none of 
the correlations were extremely high. 
Some criteria showed no significant cor- 
relations with any test in the battery so 
at this point an arbitrary decision was 
made. This decision was to omit from 
further consideration any criterion which 
did not have a correlation of at least .340 
with any test in the predictor battery. 
This reduced the number of criteria fi- 
nally selected for study to thirteen. 
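
The screening rule just described can be sketched in a few lines of code. This is only an illustration: the criterion names and coefficients below are invented, and the comparison uses the size of the coefficient regardless of sign, which is an assumption made here because one criterion (the C.A.A. landing grade) was coded in the reverse direction.

```python
# Minimal sketch of the criterion-screening rule (hypothetical data).
# A criterion is kept only if at least one test in the battery correlates
# with it at .340 or more. Using the absolute value is an assumption.

def select_criteria(validities, threshold=0.340):
    """Return the names of criteria whose largest |r| reaches the threshold."""
    return [
        name
        for name, coefficients in validities.items()
        if max(abs(r) for r in coefficients) >= threshold
    ]

# Invented coefficients: one list of test-criterion correlations per criterion.
validities = {
    "criterion_a": [0.12, 0.41, -0.05],
    "criterion_b": [0.10, 0.22, 0.33],
    "criterion_c": [-0.38, 0.05, 0.20],
}

selected = select_criteria(validities)
print(selected)
```

With the invented figures above, the second criterion would be dropped because none of its coefficients reaches .340.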

Table V gives the correlations ob-
tained between the individual tests in the
predictor battery and each of the criteria
selected for further study.

None of the correlations between the 
tests and the criteria were high. The 
largest number of comparatively high 
correlations were obtained between tests 
in the predictor battery and the C.A.A. 
over-all grade. The Biographical Inven- 
tory showed the largest number of com- 
paratively high correlations with the 
various criteria, and appeared to be the
most useful predictor as far as data from
the check flights were concerned. 

The intelligence tests were not high- 
ly correlated with the criteria, and 
the Judgment-Reaction Test generally 
showed low positive correlations. Both 
the Test of Aviation Information and 
the Test of Mechanical Comprehension 
were correlated most highly with cri- 
terion data obtained from the private 
pilot test, although the correlations with 
other criteria were low. The Desire to 
Fly Inventory was likewise correlated 
highly with the C.A.A. Inspector’s data, 
but its highest correlation was with the 
rating of Emotional Control. Of the two 
coordination tests included in the bat- 
tery, the Two-Hand Coordination Test 
appeared to have the most significant 
relationship to the criteria. 

Generally, the correlations showed 
considerable variation and little predic- 
tive significance can be attached to them. 
Although the criteria used differed from 
those in the present research, much the 
same type of results were found in the 
Boston-Midwest Project when correla- 
tions were computed between individual 
predictors and criteria (20). 

Table VI shows the partial regression 
coefficients which give the relative 
weighting of the nine factors measured 
by the test battery in predicting the 
various criteria. The line second from 
the bottom of the table contains the 
multiple correlations, which were ob- 
tained between the test battery and each 
criterion. The bottom row contains the 
corrected multiple correlation co-
efficients.
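
The relation between beta weights and the multiple correlation can be illustrated with a short sketch. It solves the normal equations by ordinary Gaussian elimination rather than by the Doolittle worksheet layout used in the study, and the two-test correlation values are invented. The shrinkage formula in `shrunken_r` is one common correction and is an assumption here, since the text does not state which correction was applied.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def multiple_correlation(r_xx, r_xy):
    """Beta weights and multiple R from test intercorrelations and validities."""
    betas = solve(r_xx, r_xy)
    r_squared = sum(beta * r for beta, r in zip(betas, r_xy))
    return betas, math.sqrt(r_squared)

def shrunken_r(r, n, k):
    """Assumed correction: 1 - (1 - R^2)(N - 1)/(N - k - 1), floored at zero."""
    adjusted = 1.0 - (1.0 - r * r) * (n - 1) / (n - k - 1)
    return math.sqrt(adjusted) if adjusted > 0 else 0.0

# Invented two-test example: intercorrelation .50, validities .40 and .30.
r_xx = [[1.0, 0.5], [0.5, 1.0]]
r_xy = [0.4, 0.3]
betas, big_r = multiple_correlation(r_xx, r_xy)
print(betas, big_r)
```

Under the assumed correction, a multiple correlation obtained from nine tests and thirty-seven students shrinks considerably, which is in line with the caution the authors express about the corrected coefficients.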

In general, no one test was weighted
consistently more heavily than any of the
other tests in predicting the criteria. As
far as criteria obtained from C.A.A. In-
spectors were concerned, three tests had
the heaviest weighting. These were the
Test of Aviation Information, the De-
sire to Fly Inventory, and the Two-Hand
Coordination Test. Although all the tests
were heavily weighted for some of the
criteria, the weighting was not consistent.
With the possible exception of the Mash-
burn and the Ohio State Psychological
Examination, the test battery showed
promise of value in the prediction of
the criteria when the weights were prop-
erly adjusted for each test in the bat-
tery.


TABLE VI
Beta Weights and Multiple Correlations

Slip Grades   Slip Dem.   Instr. Land Grades   OSFI Land Grades   Land Scores   Emot. Cont.   Land

Note: Double asterisk indicates that the correlation is significant at the one per cent level. Single asterisk indicates significance at the five per cent level.

The Judgment-Reaction Test proved 
to be more important in predicting cri- 
teria based on performance of specific 
landing maneuvers than on the over-all 
grade given by the C.A.A. Inspector. The 
implication is that whatever factors are 
measured by the test are involved in 
landing a plane. 

Examination of the multiple correla- 
tions at the bottom of Table VI shows 
that the test battery predicted perform- 
ance on the C.A.A. private pilot exami- 
nation far better than it predicted per- 
formance on any of the other criteria. 
This was the only correlation which was 
significant at the one per cent level. 
However, seven of the correlations 
proved to be significant at the five per 
cent level. 


DISCUSSION OF RESULTS 


One of the purposes of this research 
was to examine the possibility of analyz- 
ing the factors involved in learning to 
fly. It appears from an examination of 
these results that, at least for the group 
of students studied, and using the cri- 
teria which were available in this study, 
it is possible to discover what some of 
these factors are. To discover all of the 
factors involved would be a task far 
beyond the scope of the present investi-
gation in which the desire was only to
demonstrate that such a procedure was
feasible.

In general, the corrected correlation
coefficients were not sufficiently high to
warrant specific conclusions regarding
the predictive value of this battery if
used with another sample.

It was thought that perhaps the test 
battery would predict more successfully 
at different stages in the student's flight 
training. In other words, it might predict
early or ultimate success but, perhaps,
not both. Its most successful prediction
was for the over-all rating at the end of
the course. Other than this, there is no
basis for assuming that the battery was of
more value in predicting performance at
one stage of flight training than at
any other. In addition, there was no con- 
sistency in the weighting of the various 
tests at different time intervals through- 
out the flight course. Furthermore, it 
must be remembered that the private 
pilot test was the only measure which 
included ratings on more than one type 
of performance. 

There was no consistent relationship 
between the accuracy of prediction and 
the number of scores involved in the 
rating. For example, ratings on Emo- 
tional Control were made only once, and 
a multiple correlation of .634 (significant 
at the 1% level) was obtained between 
the test battery and this criterion. Rat- 
ings by instructors on landing grades 
were based on averages of a number of 
different landings, yet on the sample of 
landings immediately preceding the third 
check flight, the multiple correlation was 
only .512, a correlation that might have 
arisen by chance. 

The fact that there was no consistency 
in the weighting of items for the various 
criteria offers several possibilities. First
of all, it may be that each maneuver,
even though it is a landing maneuver, 
actually is a different performance. This 
might explain differences in weighting 
between regular landings and landings 
with a slip. However, on regular land- 
ings, there were different weightings for 
the various tests. Furthermore, landings 
in different check flights were weighted 
differently in spite of the fact that in- 
structors were supposedly rating on a 
fixed scale. 

Differences in weights were more ap- 
parent when different raters were con- 
cerned than when the same rater was 
involved. This finding is in accord with
what has been reported previously in this
investigation, namely, that there was
little agreement between different raters’
ratings of similar performances.

Further evidence that instructors
found it difficult to adhere to a fixed scale 
in grading performance is given by the 
differences in weighting of items in the 
instructor’s landing grades taken before 
the first and third check flight. This dis- 
crepancy is not so apparent in check 
pilot’s ratings. Such observations are im-
portant, for the check pilot is in a posi-
tion to be more objective in his grading
than is the instructor who must ride with 
the student every day. 

It is quite possible, however, that much 
of the inconsistency between ratings on 
what appear to be similar maneuvers, 
may be due to weather variations. For 
example, in quiet air a student might 
receive a satisfactory grade on landing 
performance, yet if he were to repeat the 
same performance in rough air, his grade 
would be lower. The landing perform- 
ance in rough air would involve quite 
different skills than the landing made in 
still air. This suggests that some method
should be devised whereby variations in
weather conditions could be included in
the rating sheet.

It is impractical to state that there is 
but one criterion of ability to pilot light 
aircraft. For a number of years, research 
on pilot selection was based upon a pass- 
fail criterion, but this has proved to be of 
less and less value (25). Rather than
predicting success or failure on such a gross
basis, a successful battery should have use 
in predicting performance on specific as- 
pects of flight performance, and it was one 
of the purposes of this research to demon- 
strate that certain aspects of flight train- 
ing might be predicted by a test battery. 
It appears that the abilities required for 
the performance of different maneuvers 
are themselves different. Therefore, one 
of the uses for this test battery would ap- 
pear to be in demonstrating students’ 
weaknesses to the instructor before any 
lessons were given. Furthermore, should 
a flight course be designed in such a man- 
ner that certain specific aspects of flight 
performance were to be stressed (rather 
than over-all ability to fly), a battery 
such as this would be useful in deter- 
mining applicants’ fitness for the course. 


SUMMARY AND CONCLUSION


In this research, a battery of nine
psychological tests was administered to
thirty-seven male flight students. Six of
these tests were paper and pencil tests
and three were psychomotor tests. Rat-
ings on various aspects of flight per-
formance during a course in primary
flight training were available for the
sample studied. Intercorrelations among
all of the variables were computed, and
regression weights and multiple correla-
tions were computed for thirteen se-
lected criteria.

On the basis of the results obtained 
in this study, it may be stated that:

1. It was possible to assemble a battery
of tests which would be of use in analyz- 
ing the factors involved in learning to 
fly light aircraft. Those factors, as meas- 
ured by tests, were found to be: 


Intelligence 

Aviation Information 

Biographical Information 

Mechanical Comprehension 

Desire to Fly 

Two-hand Coordination 

Ability to make Serial Reactions (Eye- 
hand-foot) 

Ability to make rapid judgment 
reactions. 


2. The amount of information about 
aviation which the flight student has at 
the beginning of the course seems to be 
highly related to the grade he receives
on the private pilot flight test. For dif- 
ferent criteria, however, the various fac- 
tors measured by the test battery must be 
weighted differently. 

3. Although this test battery produced 
multiple correlations ranging from .512 
to .707 with thirteen separate criteria, 
the results indicate that the test battery 
was most successful in predicting when 
the criterion used was an over-all rating 
of flight performance, rather than a rat-
ing of performance on a specific
maneuver.

4. Some writers (8) have expressed 
doubt regarding the effectiveness of 
psychomotor tests in the prediction of 
flight success. Within the limits of this 
study, it is possible to say that they can 
be used for this purpose in combination 
with paper and pencil tests. 

5. A new test was included in the bat-
tery used in this study. This test was
constructed on the hypothesis that it was
possible to construct a test which would
measure factors related to a specific flight
maneuver, in this case, landing. In all
but one case, this test contributed to the
prediction of performance on this
maneuver. It was much more related to
the performance of specific maneuvers
than to the performance of a series of
flight maneuvers combined into a flight
test. It is possible that revised methods
of scoring this test might produce more
fruitful results.

6. In all cases, the use of a battery of 
tests was of considerably more predictive 
value than any one of the tests used 
alone. 

7. It was found that trained raters
differ from one another in the factors
which they stress in evaluating perform-
ance, and in any future study attempting
to evaluate performance in flying, care
should be taken to train raters even more
thoroughly in the techniques of rating.
Furthermore, some indication of the
weather conditions under which each
maneuver was performed would be of
value in evaluating a student’s perform-
ance on the same maneuver under dif-
ferent weather conditions.

8. Since ratings made by individuals
not having constant contact with flight 
students seemed to be somewhat more 
consistent than ratings made by flight 
instructors who were in constant contact 
with the student, it seems desirable that 
relations between the rater and his sub- 
ject should be kept on as objective a 
basis as possible. 

9. It might be well in future research
to use over-all ratings obtained from sev-
eral sources. In this study, only one over-
all measure was available. The use of
measures obtained from various sources
would make it possible to investigate the
possibility that this same battery would
predict as successfully, or more success-
fully, for other over-all ratings. Further-
more, it would seem desirable in future
work to use more varied types of specific
criteria rather than to study only landing
performance. The criteria should be as
independent and objective as possible.

10. A further suggestion for future re-
search would be to submit all of the tests
included in the battery to such a test
selection technique as the Wherry-
Doolittle, in order to ascertain if certain
of the tests which were found to be un-
important in some of the criteria, might
be eliminated from the battery.

11. It is suggested that a battery such 
as this might be useful in indicating 
students’ potential weaknesses to the in- 
structor so that these students might be 
eliminated before training, or, if pos- 
sible, instruction might be directed 
toward overcoming these weaknesses.


THIS study was undertaken in an ef-
fort to improve the prediction of
success in learning to pilot light aircraft. 
It featured a new psychomotor test 
which, it was hoped, would contribute 
materially to the solution of this timely 
problem. 


STATEMENT OF PROBLEM 


Procedures in this investigation were 
planned to test the hypothesis that the 
ability to perceive and react differentially 
to visual cues, randomly sampling the 
visual field, is related to the piloting of 
light aircraft. This hypothesis was based 
on several considerations: 1. the failure 
of simple reaction time as a predictor of 
success in learning to fly; 2. the com- 
parative success of complicated reaction 
time as a predictor of flight performance; 
3. the observations of the writer, and of
other fliers, that (a) correct responding
within reasonable time limits seemed 
more important in piloting than mere 
quickness of reaction, and that (b) many 
of the responses required for successful 
piloting had to be made to indirect visual 
cues perceived as an inseparable part of 
the total configuration. 


THE INDIRECT VISION TEST 


The picture on the following page will 
be helpful in understanding the Indirect 
Vision Test Situation. The subject was 
seated on the adjustable stool in front of 
the beaver-board screen. Behind the 
rectangular opening in the center was a 
tachistoscope that presented a series of 
simple discrimination problems. This
constituted the foveal unit. Behind the
small round openings arranged in con- 
centric circles around the center were 
hooded incandescent bulbs which could 
be activated in random order. These con- 
stituted the parafoveal unit of the 
apparatus. 


The “Instructions to the Subject” 
further reveal the general nature of the 
indirect vision test: 

This is a test of your ability to see and 
react to flashes of light at different points in 
the visual field. When a light flashes on the 
screen in front of you, you are to place the 
stick in the corresponding notch in the small 
board directly in front of you. (Demonstrate) 

Use the right hand to move the stick. Hold 
it in any manner that is convenient for you, 
but do not grasp it too tightly. Return the 
stick to the center position immediately after 
each response so that you will be ready for 
the next one. 

To help you keep looking straight ahead 
throughout the test, you will look through 
the small lighted opening in the center of 
the screen. Behind this opening will appear 
successive rows of three squares. When the 
X appears in the middle square, you are to 
push the button on the table to your left. Use 
the left hand. Keep the left arm resting on 
the table and a finger on the button so as to 
be ready when the X appears in the middle 
square. 

You can ask about anything that you don’t 
understand. 

You will be given a practice run of about 
one minute, after which you will be given 
the opportunity to ask about anything that 
is not clear to you.

The test run will continue for about ten 
minutes. 


[Figure: Indirect Vision, Front View]

The sole intended function of the
foveal task was that of insuring uniform
fixation during the test and between indi-
viduals tested. The problem was reduced
to a level of facility that gave, with very 
rare exceptions, perfect performances. No 
record was made of responses to the 
foveal stimuli, but a buzzer signaled that 
a response was being made. 

The foveal unit of the apparatus con- 
sisted of a weight-driven, pendulum- 
controlled, rotating-drum tachistoscope. 
The tachistoscope was placed behind the 
screen and the successive stimuli were 
observable to the subject through a small 
rectangular opening (1” x 2”) in the cen-
ter of the screen. The drum rotated and
gave a new exposure every 2 seconds.
Illumination of the foveal stimulus was
provided by a 20 watt incandescent bulb,
hooded, and mounted two inches above
and three inches in front of the rotating
drum.

The constancy of eye-position was con-
trolled by an adjustable stool and a hori-
zontally fixed headrest 26 inches from the
screen.

The stimuli for the indirect vision dis- 
crimination task were flashes of light 
through twenty-one openings in the 
screen. These openings, three quarters of 
an inch in diameter, were distributed in 
circles around the center of the screen at 
such distances as to give visual angles of 
47°, 53°, and 48°. Because of the inter- 
ference of the response selector the bot- 
tom column of lights was deleted, leaving 
twenty-one openings. 

The light flashes were produced by
successively activating one or another of
the 15 watt bulbs mounted and hooded
behind the openings. An alternating cur-
rent of 10 volts provided the energy for
the flashes. The rate and duration of the



flashes were held constant by a high- 
fidelity electric motor, operating on 
alternating current. The flashes occurred 
every 2.2 seconds and the bulbs were 
activated for a period of .g of a second. 
The order in which the lights flashed 
was controlled by an electromagnetic ro- 
tating circuit selector, the pattern having 
been determined by random selection. 

A correct response consisted in placing 
the stick, within 1.1 seconds, in the notch 
corresponding to the direction of the 
light flash. Correct responses were auto- 
matically recorded on an electromagneti- 
cally operated pen-polygraph. A buzzer, 
placed in parallel with the recording 
magnet, signaled the subject that a cor- 
rect response had been made. 

The test ran for a continuous series of
12 cycles of the stimulus pattern. Each
cycle flashed 25 lights, which made a total
of 300 discrete stimulations. Cycles were
recorded by an electric counter and were
automatically marked on the polygraph
tape.

The measure taken as the score was 
the total number of errors or lights 
missed. 
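
As a check on the figures in the description above, the length of the test run and the scoring rule can be worked out directly. The constant and function names are hypothetical labels introduced for this sketch; the numbers come from the description of the apparatus.

```python
# Arithmetic check of the test length and scoring (names are illustrative).
CYCLES = 12                 # cycles of the stimulus pattern per test run
LIGHTS_PER_CYCLE = 25       # flashes in one cycle
FLASH_INTERVAL_S = 2.2      # seconds between successive flashes

total_stimuli = CYCLES * LIGHTS_PER_CYCLE             # 12 * 25 stimulations
run_minutes = total_stimuli * FLASH_INTERVAL_S / 60   # length of the run

def error_score(correct_responses, total=total_stimuli):
    """Score = number of lights missed or answered incorrectly."""
    return total - correct_responses

print(total_stimuli, round(run_minutes, 1), error_score(270))
```

The run length that results is roughly in line with the "about ten minutes" announced to the subject.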

The screen was of light brown beaver-
board, and the general lighting for the
test room was provided by an exposed
25 watt light bulb, suspended 3½ feet to
the rear and 5 feet above the eye-position
of the subject. No daylight was used.

The reliability of the indirect vision 
test was checked by the split-half and by 
the test-retest methods. Scores on odd
numbered cycles correlated with scores
on even numbered cycles .925 (N = 265).
The test-retest coefficient of reliability
was .784, based on the scores of 130 sub-
jects who took the test a second time
after a one-week interval. A likely inter-
pretation of the discrepancy between the
coefficients is that the difference is large-
ly due to personal factors which vary
from one week to the next. The split-
half technique would be expected to
render a close approximation of the true 
internal consistency of the test, by more 
closely controlling the subjective factors. 
Nevertheless, it is the lower reliability 
measure that operates in the determining 
of validity. 
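
The odd-even comparison described above can be sketched as follows. The per-cycle error counts are invented for illustration, and the function names are hypothetical. No step-up correction is applied, since the text does not mention one.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(errors_by_cycle):
    """Correlate each subject's errors on odd cycles with errors on even cycles.

    errors_by_cycle holds one list per subject, with one error count per cycle.
    """
    odd = [sum(subject[0::2]) for subject in errors_by_cycle]   # cycles 1, 3, 5, ...
    even = [sum(subject[1::2]) for subject in errors_by_cycle]  # cycles 2, 4, 6, ...
    return pearson_r(odd, even)

# Invented error counts for four subjects over the twelve cycles.
sample = [
    [2, 1, 3, 2, 1, 2, 2, 1, 3, 2, 1, 2],
    [5, 6, 4, 5, 6, 5, 4, 6, 5, 5, 6, 4],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
    [3, 3, 4, 2, 3, 3, 2, 4, 3, 3, 3, 2],
]
reliability = split_half_reliability(sample)
print(round(reliability, 3))
```

The same `pearson_r` function applied to first and second administrations of the test would give a test-retest coefficient.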

The widespread distribution of scores
on this test suggests that the individuals 
tested differed markedly in whatever 
ability or combination of abilities was 
being used. 


OTHER TESTS USED 

The design of the experiment was ex- 
tended to permit observation of the 
operation of the Indirect Vision Variable 
when combined with a selected group of 
pre-flight tests. The tests selected for study 
in connection with the Indirect Vision 
Test were: 

1. The Self-Administering Test of Mental 
Ability (Gamma A. M. Otis Quick- 
scoring) 

2. Test of Mechanical Comprehension 
(Form B, C.A.A.) 

3. Desire to Fly (Form XPA) 

4. Test of Aviation Information (Form P) 

5. Two-Hand Coordination Test 


These tests are a sampling of the better 
paper-pencil and psychomotor predictors 
now available. The study by Lane shows
that all of the old tests in this battery
have some relationship to various meas-
ures of pilot success (13).


THE CRITERIA 

Our chief interest in this study was the 
student's ability to fly at or near the end 
of the normal training period. Conse- 
quently most of the criteria selected for 
use were measures and ratings made dur- 
ing this later period of training. 

The only specific maneuver separately
used for validation was that of landing
the plane. Landings were selected for two
reasons: (1) landing is universally con-
sidered one of the crucial aspects of learn-
ing to fly, and (2) landing would seem to
require the ability to make discrimina-
tory responses to indirect visual cues per-
ceived in relation to the total visual
pattern.

The criteria included:

1. C.A.A. Flight Inspectors’ Overall Grades
2. C.A.A. Flight Inspectors’ Demerit Scores
on Landings
3. Check Pilots’ Overall Grades on Check
Flights 1, 2, 3 and 4 (O.S.F.I.)
4. An Average of the Check Pilots’ Grades
on Landings, 4th Check Flight
5. Instructors’ Ratings of Skill, Judgment,
and Emotional Stability (using the
Purdue “Scale for Rating Pilot
Competency”)

A detailed discussion of these criteria is
included in the study by Lane reported
in the first part of this monograph.


TABLE I
Intercorrelations of Predictors (N = 88)

                             Indirect  Mental   Mechanical  Desire     Aviation   Two-Hand
                             Vision    Ability  Compre-     to Fly     Informa-   Coordina-
                             Test      Test     hension     Inventory  tion Test  tion Test

Indirect Vision Test                   .039     .103        .264       .025       -.058
Mental Ability Test                             .393        .129       .461       .450
Mechanical Comprehension                                    .169       .395       .498
Desire to Fly Inventory                                                .068       .083
Aviation Information Test
Two-Hand Coordination Test


RESULTS AND DISCUSSION OF RESULTS

The results of this experiment are
summarized in the four tables. The cor-
relation coefficients were computed by
the product-moment method from raw
scores. Table I presents the intercorrela-
tions of the predictors. Inter-correlations
among the criteria are presented in
Table II. Table III gives the validity
coefficients for the predictors when used


TABLE II
Intercorrelation of Criteria (N = 88)


C.A.A.   C.A.A.   O.S.F.I.  O.S.F.I.  O.S.F.I.  O.S.F.I.  O.S.F.I.  Purdue  Purdue    Purdue
Overall  Landing  Overall   Overall   Overall   Overall   Landing   Scale   Scale     Scale
Grade    Score    Check #1  Check #2  Check #3  Check #4  Check #4  Skill   Judgment  Emotion


583 -119 .229 +507 298 -446 «165 +270 
-291 -162 -206 -145 -145 


+470 
-385 
- 366 


.382 


Purdue Scale
Judgment
Purdue Scale
Emotion


TABLE III
Correlations Between Individual Predictors and Criteria (N = 88)


C.A.A.   C.A.A.   O.S.F.I.  O.S.F.I.  O.S.F.I.  O.S.F.I.  O.S.F.I.  Purdue  Purdue    Purdue
Overall  Landing  Overall   Overall   Overall   Overall   Land      Scale   Scale     Scale
Grade    Score    Check #1  Check #2  Check #3  Check #4  Check #4  Skill   Judgment  Emotion
Test 089 060 068 6 61* 
Test —.! —.050 2 +o 22 
+2 -. =.2 > .300%* 361 20. 
poe 7 5 7 3 3 37 3 34 
Inventory —.012 +055 -083 030 ~—.059 186 153 
Aviation Informa- 


—.129 


. 160 -210* 


* Significant at the 5% level.
** Significant at the 1% level.


independently. Table IV shows the beta 
weights for the various tests in relation 
to each of the ten criteria. The last row 
of figures in Table IV shows the multiple 
correlations between the test battery and 
each criterion. 
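
The product-moment computation from raw scores referred to in this section can be sketched as follows. This is only an illustration of the standard raw-score formula, r = (NΣXY − ΣXΣY) / √[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)]; the function name `product_moment_r` and the five pairs of scores are hypothetical and are not data from the study.

```python
import math


def product_moment_r(x, y):
    """Pearson product-moment correlation computed directly from raw scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length sequences of at least 2 scores")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return numerator / denominator


# Hypothetical scores for five students (illustrative only, not from the study).
test_scores = [12, 15, 9, 20, 17]
flight_grades = [70, 74, 65, 80, 79]
r = product_moment_r(test_scores, flight_grades)
```

Applied to each predictor-criterion pair in turn, a routine of this kind would yield a table of validity coefficients of the sort reported in Table III.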

The Indirect Vision Test shows a 
marked degree of uniqueness. In only 
one instance is the inter-correlation 
above .103. The .103 is with Mechanical 
Comprehension. The higher one, .264, is 
with the Desire-to-Fly Inventory. Perhaps 
those who had the greatest desire to fly 
were the most highly motivated on the 
Indirect Vision Test. 

Only one other test, the Desire-to-Fly
Inventory, shows any considerable de-
gree of independence. The highest inter-
correlation for this test is the one with
the Indirect Vision Test indicated above.


TABLE IV
Beta Weights and Multiple Correlations (N = 88)

The four remaining tests of the
battery, namely the Mental Ability Test,
Mechanical Comprehension, Aviation
Information, and the Two-Hand Co-
ordination Test, show a considerable
commonality. The inter-correlations for
these tests range from .393 to .498. A
factorial analysis is needed to appraise
the common elements in the hope of re-
ducing the number of tests or test items
needed.

A brief visual survey of Table II re-
veals that the inter-correlations tend to
run a little higher for the criteria than
they were in the case of the predictors.
This was to be expected on several
counts. In the first place four of the
criteria, the O.S.F.I. check flights, were


C.A.A.   C.A.A.   O.S.F.I.  O.S.F.I.  O.S.F.I.  O.S.F.I.  O.S.F.I.  Purdue  Purdue    Purdue
Overall  Landing  Overall   Overall   Overall   Overall   Landing   Scale   Scale     Scale
Grade    Score    Check #1  Check #2  Check #3  Check #4  Check #4  Skill   Judgment  Emotion


Indirect Vision Test
Mental Ability Test


+035 
-. —.264 


dination Test —.007 —.076 
R 227 


—.049 


258 -197 


-O17 —.161 


—.163 


-131 +04 +153 


* Significant at the level 
tion Test -195 -054 .206 .333* .079 .301** . -027 260%
Two-Hand Coor- 
dination Test +006 -061 -138 —.009 206 -271* 
C.A.A, d 
Grade 
me! 
-087 143 -153 
skil 
prehension 382 —.025 a +120 +230 +251 +24) +049 163 
Desire to Fly het 
Inventory —.006 —.123 .025 -002 -025 —.108 +103 +102 
Aviation Informa- the 
tion Test +049 -299 .252 .176 
'wo-Hand Coor- tlio 
—.019 


repetitive, and involved two successive
ratings by each of the check pilots. More-
over, no serious attempt was made to
eliminate overlapping in the selection of
criteria. The tenuous nature of the
criteria did not warrant selection for
mutual exclusion. In one case the rating
of a single performance is treated both as
a separate criterion and as a factor in-
cluded in another. This undoubtedly
goes a long way toward explaining the
comparatively high correlation (.758) be-
tween landings of the fourth check flight
and the over-all score for the entire flight.
The fourth check flight was the only one
for which a separate landing grade was
given.

It is interesting and perhaps significant
that the correlations between check
flights involving two ratings by the same
check pilot for each of the subjects
average .402, while those between check
flights in which each student received
one rating from each check pilot average
.170. It is unlikely that differences in per-
formances would account for more than a
fraction of this discrepancy. The lower
average (.170) suggests unreliability in
these ratings, and leads one to wonder
whether the higher self-consistency (.402) 
might not be due to constant errors on 
the part of the raters. Thus, not only the 
reliability of the ratings, but the validity 
as well, is open to question. 

The second highest intercorrelation 
(.692) is found between the skill and emo- 
tion aspects of the Purdue Scale. Judg- 
ment and emotion correlate .354, while 
skill and judgment show a relationship 
of .407. The halo effect may be operating
here, since all three aspects are rated by
the same instructor. A further investiga-
tion of the nature of these relationships
would be in order. 

Apparently one of the greatest needs in 
aviation psychology is for more adequate
criteria of pilot skill. The photographic 
method may contribute to the solution, 
but even this seemingly objective method 
is not without its problems. Evidence so 
far suggests that agreement between rat- 
ings and photographic records is about 
the same as between raters. As yet there 
is no clear way of establishing either of 
these methods as the one by which the 
other can be evaluated. A factorial 
analysis of the data now available is a 
very necessary next step. 

It is obvious from the data in Table
III that no single test of the battery is
adequate for the prediction of success in
learning to pilot light aircraft. Those
correlations which differ significantly
from zero are marked with one or two
asterisks. For those marked with two
asterisks the chances are less than one in
a hundred that such a correlation would
be drawn from a universe where the true
value is zero. One asterisk indicates that
the chances are between 1 in 20 and 1 in
100 that such a correlation would be
drawn from a universe where the true
value is zero.
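
The asterisk convention can be illustrated with a short calculation for N = 88. The monograph does not say which significance test was used, so the sketch below assumes the Fisher z approximation; an exact t test gives nearly the same cut-off values at this sample size. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def two_tailed_p(r, n):
    """Approximate two-tailed p-value for H0: true correlation is zero (Fisher z)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def critical_r(n, alpha):
    """Smallest |r| that reaches two-tailed significance at level alpha (Fisher z)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n - 3))


def asterisks(r, n):
    """Return '**' for the 1% level, '*' for the 5% level, '' otherwise."""
    p = two_tailed_p(r, n)
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


N = 88
r_05 = critical_r(N, 0.05)  # roughly .21
r_01 = critical_r(N, 0.01)  # roughly .27
```

Under this assumption a coefficient such as .261 earns one asterisk and one such as .382 earns two, which agrees with the levels cited in the text for those values.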

On the whole the correlations are low.
They are, on the average, a little lower
than those reported by Lane for the
same tests (13). One thing that may help
account for this lowering of relationship 
is the fact that the scores for this study 
were taken from three quarters of train- 
ing, rather than the two used by Lane. 
Changing weather conditions and chance
factors have had a greater opportunity to
influence the outcomes.

While the Indirect Vision Test did not 
produce according to expectation, it was 
shown to have a fairly consistent relation- 
ship with the later check pilot grades 
and the instructor ratings. The highest 
relationship (.261) is with skill, as esti- 
mated by the student's regular instructor, 
and the second highest (.248) is with skill
in landing, as graded by the check pilot
on the fourth check flight. Both of these
correlations are significant at the 5%
level.

The Mechanical Comprehension Test 
leads the field in predictive value, with 
seven out of the ten validity coefficients 
significant at the 1% level, all being 
above .27. The second-best individual 
predictor is the Aviation Information 
Test. It has correlations significant at 
the 1% level with O.S.F.I. check flights 
two and four, skill on the Purdue Scale, 
and with the landing score on the fourth 
check flight. 

The Two-Hand Coordination Test is 
of little predictive value, except in re- 
lation to the landings on the fourth check 
flight and the three ratings on the Purdue 
Scale. The Mental Ability Test and the 
Desire-to-Fly Inventory, according to this 
study, are of little or no value in pre- 
dicting pilot success. 

It is interesting to note that the 
Mechanical Comprehension Test is the 
only one that bears a significant relation- 
ship with the C.A.A. examination 
criteria. Perhaps the validity of these 
examinations should be seriously investi- 
gated, since the granting or withholding 
of the private pilot license depends upon 
them. 

The beta weights for the Indirect 
Vision Test reveal that although the 
original validity coefficients were low 
they are used almost in toto in the final 
predictive value of the test battery. The 
greatest contribution of the Indirect 
Vision Test is in the prediction of land-
ing ability as measured by the O.S.F.I.
technique in the fourth check flight. It
may well be that the future usefulness
of this test will be highest in relation to
this particular maneuver, involving as it
does a rapidly changing visual pattern.

The Mechanical Comprehension Test 
will be seen to make the highest average 
contribution. Half of the beta weights for 
this test are above .20. It is especially 
strong in relation to the C.A.A. overall 
grade (.382). 

The three tests accounting for the 
greater part of the predictive value of the 
battery are the Mental Ability Test, the 
Test of Mechanical Comprehension, and 
the Aviation Information Test. The pre- 
dictive value of a battery comprised of 
these three tests alone would approach 
that of the six-test battery for most of 
the criteria. In any case it is recom-
mended that in any future use of these
data, tests showing beta weights of less 
than .10 be dropped from the computa- 
tion of R for any particular criterion. 
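
The recommendation to drop tests with beta weights below .10 and recompute R can be sketched as follows. This is a minimal illustration using the standard relations β = Rxx⁻¹·rxy and R² = Σβᵢ·rᵢy, not a reconstruction of the authors' computations. The three-predictor correlation values and all function names are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictor intercorrelation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def beta_weights_and_R(r_xx, r_xy):
    """Standardized beta weights and multiple R from correlation inputs."""
    betas = solve_linear_system(r_xx, r_xy)
    r_squared = sum(b * v for b, v in zip(betas, r_xy))
    return betas, math.sqrt(r_squared)


def reduced_battery(r_xx, r_xy, min_beta=0.10):
    """Drop predictors whose |beta| is below min_beta, then recompute betas and R."""
    betas, _ = beta_weights_and_R(r_xx, r_xy)
    keep = [i for i, b in enumerate(betas) if abs(b) >= min_beta]
    sub_xx = [[r_xx[i][j] for j in keep] for i in keep]
    sub_xy = [r_xy[i] for i in keep]
    new_betas, new_R = beta_weights_and_R(sub_xx, sub_xy)
    return keep, new_betas, new_R


# Hypothetical intercorrelations among three predictors and their validities
# against one criterion (illustrative values only, not taken from the tables).
r_xx = [[1.0, 0.4, 0.1],
        [0.4, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
r_xy = [0.30, 0.25, 0.05]

full_betas, full_R = beta_weights_and_R(r_xx, r_xy)
keep, new_betas, new_R = reduced_battery(r_xx, r_xy)
```

In this made-up case the third predictor has a beta weight near .01 and is dropped, and R for the two remaining predictors is almost unchanged.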

A survey of the multiple correlations 
as presented in Table IV shows that the 
test battery bears no significant relation- 
ship to C.A.A. landing scores or to the 
over-all scores on the third check flight. 
For performance during the first and 
second check flights the battery of tests 
would seem to have only slight predictive 
value. The R's for these check flights are 
.378 and .387. The multiple correlations 
for the remaining criteria range from 
.422 (Judgment on the Purdue Scale) to 
.489 (landings on the fourth check flight), 
all being significant at the 1% level. 


RECOMMENDATIONS


The Indirect Vision Test can be im-
proved in several ways. In this experi-
ment the visual cues were presented at a
constant rate. Breaking up this rhythm
might make the test more differentiating
and possibly increase its validity. Again,
by modifying the instrument to permit
shifting of the positions of the signal
lights, a more adequate sampling of the
total visual field could be obtained.

The test might be greatly improved as 
a predictor of pilot success by using, in- 
stead of discrete signals, a continuously 
changing pattern of stimuli more analo- 
gous to that encountered in actual flight. 
A moving picture of what the pilot sees 
in flying and landing could be projected 
upon a translucent screen from the rear. 
Cues for discriminatory responding 
would need to be selected and the subject 
instructed accordingly. 

It is recommended that the improved 
test be correlated with some criterion of 
success in formation flying. With the 
rapid growth in aviation, formation fly- 
ing of some kind in and around airports 
is becoming a civilian, as well as a mili-
tary, necessity.

The instrument devised for this study 
has possibilities as a training device, 
especially if the moving picture tech- 
nique were to be added. The stimulus 
control and response selector parts of the 
apparatus lend themselves to progressive 
complication, so that a series of tasks at 
different levels of difficulty could be pre- 
sented. 

A time saving might be effected by de-
termining the smallest necessary sam-
pling on the test. Separate correlations
with criteria for each of the cycles might
lead to improved validity as well as a 
saving in time. On the other hand, per- 
haps the test should be extended to dis- 
cover what effect this might have on reli- 
ability and validity. 

The apparatus can be readily modified 
to measure simple reaction time, either 
on the polygraph or with an electric 
chronometer. By eliminating the indirect 
stimuli and using only the central unit, 
simple problems in learning and remem- 
bering can be investigated. Without 
modification the apparatus could be used 
for studies of learning in relation to
distractions. The insertion of a rheostatic 
control in the stimulus circuit would per- 
mit the study of intensity thresholds in 
the different parts of the visual field. 
Colored stimuli could also be provided 
with a minimum of difficulty. 

It might be profitable to discover the 
value or lack of value of the improved 
Indirect Vision Test in predicting suc-
cess in automobile driving and in select-
ing men for industrial positions that
seem to involve continuous attending to 
wide visual areas. 

The prime need in the whole program 
of aviation psychology is for improved
measures of success in piloting. It is much 
easier to point out the need for improved 
criteria, however, than it is to propose 
ways and means of bringing about this 
desired improvement. Already the prob- 
lem has received a great deal of expert 
attention. It would seem that a sufficient 
mass of data is available on a great 
variety of criteria as to warrant a fac- 
torial analysis as a logical next step. Such 
an analysis would enable us to spot the 
commonalities among present criteria 
and to concentrate our further efforts in 
those areas contributing most to flying 
success as it has been measured. The re- 
sults of this analysis would also be help- 
ful as a guide to test making and test 
improvement. 

The introduction of polygraphic and 
photographic techniques would seem to 
be in the direction of improved accuracy 
and objectivity. The little evidence that 
we have regarding these techniques is not 
too promising. They have the advantage 
of rendering a permanent record of what 
took place, records that can be reviewed 
again and again. But the interpretation 
of meaning of the records is still proble- 
matical. A control record made by an 
expert pilot in the same plane, flying the
same course, at nearly the same time, is 
probably the best basis for comparison. 
Even this technique does not take into 
account the capricious nature of minor 
air currents. 

The rating technique is still far from 
being outmoded. While it is true that 
ratings generally show an unsatisfactory 
reliability, there are ways in which the 
rating of pilot skill can be improved. The 
training of raters is one method that has 
not been fully utilized. Training can re- 
sult in a clearer understanding of the 
specific behavior being rated, and in a 
more uniform interpretation of the 
meaning of the positions along the scale. 
Multiple ratings on the same flight per- 
formance have been previously impossi- 
ble, inasmuch as practically all of the 
basic training planes have been two- 
seaters. With the rapid developments in 
light plane designing, it would be well to 
consider the possibility of having two or 
more raters fly simultaneously with the 
subject and make individual evaluations 
of the various maneuvers. 

With the large number of criteria 
sampling the student’s skill at successive 
points in the training period, one is 
prompted to raise the question as to 
whether or not some of the early criteria 
might have predictive value for later per- 
formance. The intercorrelations of the 
criteria in this study are a step in the 
direction of finding an answer to this 
question. It is recommended that the re- 
sults of this experiment, along with the 
data from Lane’s study, be used in de- 
termining the value of measures of early 
flight performance in the prediction of 
ultimate success or failure in learning to 
pilot light aircraft. 
SUMMARY

The experiment described in detail in
the preceding chapters was addressed to
the general problem of improving pre-
diction of success in piloting light air-
craft.

The specific problem was that of de-
termining the relation between the
ability to perceive and react differentially 
to configurational changes and the ability 
to pilot light aircraft. A test was devised 
to measure the ability to perceive the 
total visual field and to make selective 
responses to changes therein. The 
changes were provided by flashes of light 
sampling the visual field. The intensity 
and duration of these stimuli were below 
the threshold for after-images. Correct 
directional responses were recorded elec- 
tromagnetically on a pen-polygraph. A 
simple foveal task was employed to con-
trol eye-fixation. Indirect visual cues were
presented at the rate of one every 2.2
seconds. To be recorded the correct re-
sponse had to be made within 1.1 seconds
after the stimulus occurred. Reliability 
coefficients for this test were satisfactory 
for its use. 
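
The timing rule just described (one cue every 2.2 seconds, a correct response counted only within 1.1 seconds of the cue) can be illustrated with a small scoring sketch. The data layout, direction labels, and function name are hypothetical stand-ins for the polygraph record, and the sketch assumes that a wrong response inside the window does not cancel a correct one, which the text does not specify.

```python
STIMULUS_INTERVAL_S = 2.2  # one indirect visual cue every 2.2 seconds (from the text)
RESPONSE_WINDOW_S = 1.1    # correct responses count only within 1.1 seconds (from the text)


def score_responses(cue_directions, responses):
    """Count cues answered with the correct direction inside the response window.

    cue_directions: direction labels, one per cue, in order of presentation.
    responses: (time_in_seconds, direction) pairs made by the subject.
    Both structures are hypothetical; the original record was a pen-polygraph trace.
    """
    score = 0
    for index, cue in enumerate(cue_directions):
        onset = index * STIMULUS_INTERVAL_S
        deadline = onset + RESPONSE_WINDOW_S
        if any(onset <= t <= deadline and d == cue for t, d in responses):
            score += 1
    return score


# Illustrative record: the second response is too late, the fourth is the wrong direction.
cues = ["left", "up", "right", "down"]
responses = [(0.6, "left"), (3.5, "up"), (4.9, "right"), (7.0, "left")]
total = score_responses(cues, responses)
```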

Scores on this test were correlated with 
ten criteria of success in flying. The r's
are low, but compare favorably with the 
predictive value of other pre-flight tests. 
Inter-correlations between the Indirect 
Vision Test and five more-or-less widely 
used pre-flight tests reveal a minimum of 
over-lapping. The beta weights assigned 
to the Indirect Vision Test, when all six 
predictors are used as a battery, further 
substantiate its unique contribution. 

The R’s between the battery of tests 
and the various criteria are comparable 
to those of similar batteries. They are 
sufficiently high and significant in rela-
tion to eight out of the ten criteria as to
warrant use of the battery in practical 
situations, pending the development of 
better predictors. 

The correct evaluation of predictors is 
still problematic, due to weaknesses in 
the criteria of success in flying. It is possi-
ble that the Indirect Vision Test, and the
battery of which it is a part, will show
higher validity when related to more ade-
quate criteria. On the other hand, the
validity coefficients may be lowered by
the improvement of criteria.

Insofar as the criteria used in this study 
are true measures of pilot ability, per-
formances on the Indirect Vision Test
and on the test battery have something in
common with the successful piloting of 
light aircraft. 

Suggestions have been made for im- 
proving the Indirect Vision Test and for 
investigating its usefulness in fields other 
than aviation. It has also been pointed 
out that the results of this experiment 
provide a basis for the investigation of 
additional problems in aviation psy-
chology.